A Study for Evaluating the Importance of Various Parts of Speech (POS) for Information Retrieval (IR)

Author

  • Chirag Shah
Abstract

Traditionally, in the vector space model of document representation for information retrieval (IR) tasks, all content words are used without considering their individual significance in the language. Such methods treat a document as a bag of words and do not exploit any language-related information. Considering such information when representing documents can help improve performance on various IR tasks, but obtaining it is considered difficult. One potentially important piece of information is the role of the various parts of speech (POS). Although the importance of each POS is subjective and depends on the application and the domain under consideration, it is useful to evaluate their importance even in a general setup. In this paper we present a study to understand this importance. We first generate document vectors using particular POS. We then evaluate how good this representation is by measuring the information content provided by the document vectors. This information is then used to reconstruct the document vectors. To show that these document vectors are better than those generated by traditional methods, we consider a text classification application. We show some improvement in classification accuracy but, more importantly, we demonstrate consistency in the results and a step toward a new and promising direction for using semantics in IR tasks.
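As a rough illustration of the first and third steps described in the abstract (building document vectors from particular POS, then reconstructing them with POS-dependent emphasis), the following minimal Python sketch may help. It is not the paper's implementation: the toy documents, the tag names, the helper functions `pos_vector` and `weighted_vector`, and the numeric weights are all assumptions made for illustration. In particular, the abstract does not specify how information content is measured, so the weights here are simply hard-coded placeholders.

```python
from collections import Counter

# Hand-tagged toy documents as (token, POS) pairs.
# Texts and tags are invented for illustration only.
DOCS = {
    "d1": [("search", "NOUN"), ("engines", "NOUN"), ("rank", "VERB"),
           ("relevant", "ADJ"), ("documents", "NOUN")],
    "d2": [("users", "NOUN"), ("quickly", "ADV"), ("rank", "VERB"),
           ("documents", "NOUN")],
}


def pos_vector(tagged_doc, keep_pos):
    """Term-frequency vector built only from tokens whose tag is in keep_pos."""
    return Counter(tok for tok, pos in tagged_doc if pos in keep_pos)


def weighted_vector(tagged_doc, pos_weights):
    """Term vector where each occurrence contributes the weight of its POS.

    Tokens whose POS has no entry in pos_weights (or a zero weight) are dropped.
    """
    vec = Counter()
    for tok, pos in tagged_doc:
        weight = pos_weights.get(pos, 0.0)
        if weight:
            vec[tok] += weight
    return dict(vec)


# Step 1: a vector that uses only nouns.
noun_only = pos_vector(DOCS["d1"], {"NOUN"})

# Step 3 (sketch): rebuild the vector with hypothetical per-POS weights.
# In the study these would come from the measured information content.
reweighted = weighted_vector(DOCS["d1"], {"NOUN": 1.0, "VERB": 0.5, "ADJ": 0.25})
```

The resulting dictionaries can be fed to any standard vector space classifier in place of plain bag-of-words counts, which is the comparison the abstract describes.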

Similar resources

A Comparative Study of the Impact of Part-of-Speech Tagging on Parsing in Automatic Processing of the Persian Language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

Examining the Content Load of Part of Speech Blocks for Information Retrieval

We investigate the connection between part of speech (POS) distribution and content in language. We define POS blocks to be groups of parts of speech. We hypothesise that there exists a directly proportional relation between the frequency of POS blocks and their content salience. We also hypothesise that the class membership of the parts of speech within such blocks reflects the content load of...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing in natural language processing that automatically analyzes the dependency structure of sentences and creates a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

A Linguistic Method into Stemming of Arabic for Data Compression

The need for good stemming rules for Arabic follows from the importance of Arabic as the sixth most used language in the world. Stemming is very important in information retrieval, data mining and language processing. Arabic has complex morphology and grammatical properties, which poses a challenge for researchers in this field. In this paper, we try to use an online morph...

Parts Of Speech Tagging for Indian Languages: A Literature Survey

Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag or other lexical class marker to every word in a sentence. In many natural language processing applications, such as word sense disambiguation, information retrieval, information processing, parsing, question answering, and machine translation, POS tagging is considered one of the basic necessary tool...

Publication date: 2002